#vision encoder · 09/05/2025
X-Fusion: Enhancing Frozen Language Models with Vision Without Sacrificing Language Skills
X-Fusion introduces a dual-tower architecture that adds vision capabilities to frozen large language models, preserving their language skills while improving multimodal performance in image understanding and generation.